TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time
Is Android malware classification a solved problem? Published F1 scores of up
to 0.99 appear to leave very little room for improvement. In this paper, we
argue that results are commonly inflated due to two pervasive sources of
experimental bias: "spatial bias" caused by distributions of training and
testing data that are not representative of a real-world deployment; and
"temporal bias" caused by incorrect time splits of training and testing sets,
leading to impossible configurations. We propose a set of space and time
constraints for experiment design that eliminates both sources of bias. We
introduce a new metric that summarizes the expected robustness of a classifier
in a real-world setting, and we present an algorithm to tune its performance.
Finally, we demonstrate how this allows us to evaluate mitigation strategies
for time decay such as active learning. We have implemented our solutions in
TESSERACT, an open source evaluation framework for comparing malware
classifiers in a realistic setting. We used TESSERACT to evaluate three Android
malware classifiers from the literature on a dataset of 129K applications
spanning over three years. Our evaluation confirms that earlier published
results are biased, while also revealing counter-intuitive performance and
showing that appropriate tuning can lead to significant improvements.

Comment: This arXiv version (v4) corresponds to the one published at USENIX
Security Symposium 2019, with a fixed typo in Equation (4), which reported an
extra normalization factor of (1/N). The results in the paper and the
released implementation of the TESSERACT framework remain valid and correct,
as they rely on Python's numpy implementation of area under the curve.
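The temporal constraint described above (training data must strictly predate test data, avoiding "impossible configurations") can be sketched in a few lines. This is an illustrative sketch, not TESSERACT's released code; the function name, tuple layout, and parameters are assumptions for the example.

```python
def temporal_split(samples, test_start, test_end):
    """Enforce a time-aware split: every training sample must predate
    every test sample, so the classifier never sees "future" data.
    `samples` is a list of (timestamp, features, label) tuples; the
    layout is illustrative, not taken from the paper's framework."""
    train = [s for s in samples if s[0] < test_start]
    test = [s for s in samples if test_start <= s[0] <= test_end]
    return train, test
```

In contrast, a random shuffle of a multi-year dataset would mix future samples into the training set, which is exactly the temporal bias the paper warns about.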
Transcend: Detecting Concept Drift in Malware Classification Models
Building machine learning models of malware behavior is widely accepted as a panacea towards effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly and it thus becomes hard, if not impossible, to generalize learning models to reflect future, previously-unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, becoming rapidly antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, well before the machine learning model's performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively when poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift based on two separate case studies on Android and Windows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training.
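The "statistical comparison of samples seen during deployment with those used to train the model" can be illustrated with a conformal-style p-value: the fraction of calibration nonconformity scores at least as extreme as the new sample's. This is a minimal sketch of that idea, not Transcend's implementation; the function name and the choice of nonconformity measure are assumptions.

```python
def credibility(nonconformity_cal, nonconformity_new):
    """Conformal-style p-value for a new sample.

    `nonconformity_cal` holds nonconformity scores of held-out
    calibration samples (e.g. distance to the predicted class);
    `nonconformity_new` is the new sample's score. A value near 0
    means the sample looks unlike anything seen in training, which
    is the kind of red flag a drift detector would raise."""
    n = len(nonconformity_cal)
    at_least_as_extreme = sum(1 for a in nonconformity_cal
                              if a >= nonconformity_new)
    return (at_least_as_extreme + 1) / (n + 1)
```

A deployment-time monitor could then flag drift when the running average of these p-values drops below a chosen threshold, triggering retraining before accuracy visibly degrades.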
Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection
Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
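A Venn-Abers predictor turns a classifier's raw score into a calibrated probability interval by fitting isotonic regression twice on a calibration set, once with the test object hypothetically labeled 0 and once labeled 1. The sketch below is a minimal, self-contained version of that procedure using pool-adjacent-violators; it is illustrative only and not Prescience's code, and the function names and data layout are assumptions.

```python
def pav(values):
    """Pool Adjacent Violators: non-decreasing isotonic fit to `values`.
    Each stack entry is a [sum, count] block; violating adjacent blocks
    (left mean > right mean) are merged until the means are monotone."""
    stack = []
    for v in values:
        stack.append([v, 1])
        while (len(stack) > 1 and
               stack[-2][0] / stack[-2][1] > stack[-1][0] / stack[-1][1]):
            s, c = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += c
    out = []
    for s, c in stack:
        out.extend([s / c] * c)
    return out

def venn_abers(calibration, score):
    """calibration: list of (classifier_score, label) with label in {0,1}.
    Returns (p0, p1): the calibrated probability interval for class 1,
    obtained by isotonic regression with each hypothetical label."""
    interval = []
    for hypothetical in (0, 1):
        pts = sorted(calibration + [(score, hypothetical)])
        fit = pav([label for _, label in pts])
        interval.append(fit[pts.index((score, hypothetical))])
    return tuple(interval)
```

A wide gap between p0 and p1, or intervals that drift away from observed accuracy over time, is the kind of probabilistic signal the abstract describes for deciding when a model has become obsolete.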